Low-Level Monitoring and High-Level Tuning of UPC on CC-NUMA Architectures

Author

  • Ahmed S. Mohamed
Abstract

We experiment with various techniques for monitoring and tuning UPC programs while porting the NAS NPB benchmarks using the recently developed GCC-SGI UPC compiler on the Origin O3800 NUMA machine. The performance of the NAS NPB in the SGI NUMA environment is compared to previous NAS NPB statistics on a Compaq multiprocessor. The SGI NUMA environment has provided new opportunities for UPC. For example, the spectrum of performance analysis and profiling tools within the SGI NUMA environment made it possible to develop new monitoring and tuning strategies that aim at improving the efficiency of parallel UPC applications. Our objective is to project the physically monitored parameters back to the data structures and high-level program constructs within the source code. This increases a programmer's ability to understand, develop, and optimize programs, enabling an exact analysis of a program's data and code layouts. Using this visualized information, programmers can further optimize UPC programs with better data and thread layouts, potentially resulting in significant performance improvements. Furthermore, the SGI CC-NUMA environment provides memory consistency optimizations that mask the latency of remote accesses, convert aggregate accesses into more efficient bulk operations, and cache data locally. UPC allows programmers to specify memory accesses with "relaxed" consistency semantics. The CC-NUMA environment exploits these explicit consistency "hints" very effectively to hide latency and further reduce coherence overheads by allowing, for example, two or more processors to modify their local copies of shared data concurrently and merging the modifications at synchronization operations. This characteristic alleviates the effect of false sharing.


Similar resources

Performance Monitoring and Evaluation of a UPC Implementation on a NUMA Architecture

UPC is an explicit parallel extension of ANSI C, which has been gaining increasing attention from vendors and users. In this work, we consider the low-level monitoring and experimental performance evaluation of a new implementation of the UPC compiler on the SGI Origin family of NUMA architectures. These systems offer many opportunities for the high-performance implementation of UPC. They also offer,...


Shared Memory Multiprocessor Architectures for Software IP Routers

In this paper, we propose new shared memory multiprocessor architectures and evaluate their performance for future Internet Protocol (IP) routers based on Symmetric Multi-Processor (SMP) and Cache Coherent Non-Uniform Memory Access (CC-NUMA) paradigms. We also propose a benchmark application suite, RouterBench, which consists of four categories of applications representing key functions on the ...


ASCOMA: An Adaptive Hybrid Shared Memory Architecture

Scalable shared memory multiprocessors traditionally use either a cache-coherent non-uniform memory access (CC-NUMA) or simple cache-only memory architecture (S-COMA) memory architecture. Recently, hybrid architectures that combine aspects of both CC-NUMA and S-COMA have emerged. In this paper, we present two improvements over other hybrid architectures. The first improvement is a page allocation algorith...


A Tool Environment for Efficient Execution of Shared Memory Programs on NUMA Systems

One of the most important performance issues on NUMA systems is data locality, since remote memory accesses have latencies several magnitudes higher than local memory accesses. This paper presents a tool environment aimed at tuning NUMA-based shared memory applications towards better memory locality. This tool environment comprises tools, supporting system facilities, and their interface. To...


Implementing a Global Address Space Language on the Cray X1: the Berkeley UPC Experience

The Berkeley UPC Compiler is an open source, high performance and portable implementation of Unified Parallel C (UPC), an SPMD global-address space language extension of ISO C. In previous work, we have experimented with our compiler on a variety of high-performance networks and parallel architectures, including distributed memory machines and clusters of SMPs. Our goal in this paper is to implement...




Publication date: 2003